ABSTRACT


Pilot “complacency” has been implicated as a contributing factor in numerous aviation accidents 
and incidents. Complacency has been defined as self-satisfaction that can result in non-vigilance 
based on an unjustified assumption of satisfactory system state. The term has become more 
prominent with the increase in automation technology in modern cockpits and, therefore, research
has been focused on understanding the factors that may mitigate its effect on pilot-automation
interaction. The study examined self-efficacy, or self-confidence, in supervisory monitoring and
vigilance performance and its relationship to complacency and to the strategies pilots use to
manage workload with automation. The results showed that self-efficacy is a “double-edged
sword”: it reduces the potential for automation-induced complacency while limiting workload
management strategies and increasing other hazardous states of awareness.






TABLE OF CONTENTS 


Abstract

Table of Contents

Introduction

Automation-Induced Complacency

Trust in Automation and Complacency

Self-Efficacy and Complacency

Objectives of Present Study

Method

Participants

Baseline Task

Experimental Task

Tracking Task

System Monitoring

Fuel Management

Experimental Design

Dependent Measures

RMSE

A’

NASA-TLX

Experimental Procedure

Results

A’

RMSE

NASA Task Load Index

Discussion

References





INTRODUCTION 


Automation refers to “... systems or methods in which many of the processes of
production are automatically performed or controlled by autonomous machines or electronic
devices” (Billings, 1997, p. 7). Billings stated that automation is a tool, or resource, that allows
the user to perform some task that would be difficult or impossible to do without the help of
machines. Therefore, automation can be conceptualized as a process of substituting some device
or machine for a human activity (Parsons, 1985). The dramatic increase in technology has
significantly impacted all aspects of our daily lives. The Industrial Revolution ushered in an era 
of untold innovation that has not only made life easier and safer, but has also provided much 
more leisure time. One need only imagine washing one’s clothes on a washing board, something 
considered an innovation during the early 1900s, to see how automation has transformed how we
see ourselves and our place in the world. Automation has become so pervasive that many devices
and machines are not even considered by most people to be “automated” anymore. Others, such
as the modern airplane, however, do not escape visibility so easily. Wiener and Curry (1980)
and Wiener (1989) noted that avionics has provided not only a dramatic increase in airline 
capacity and productivity coupled with a decrease in manual workload and fatigue, but also more 
precise handling, relief from certain routine operations, and more economical use of airplanes. 
Unlike the washing machine, the increased automation in airplanes and air navigational systems, 
however, has not developed without costs. 

The invention of the transistor in 1947 and the subsequent miniaturization of computer 
components have enabled widespread implementation of automation technology to almost all 
aspects of flight. The period since 1970 has witnessed an explosion in aviation automation 
technology. The result has been a significant decrease in the number of aviation incidents and 
accidents. However, there has also been a corresponding increase in the number of errors caused 
by human-automation interaction; in other words, those caused by “pilot error.” In 1989, the Air
Transport Association of America (ATA) established a task force to examine the impact of 
automation on aviation safety. The conclusion was that, 

“During the 1970s and early 1980s ... the concept of automating as much as
possible was considered appropriate. The expected benefits were a reduction in 
pilot workload and increased safety... Although many of these benefits have been 
realized, serious questions have arisen and incidents/accidents have occurred 
which question the underlying assumption that the maximum available 
automation is ALWAYS appropriate or that we understand how to design 
automated systems so that they are fully compatible with the capabilities and 
limitations of the humans in the system” (Billings, 1997, p. 4).

The August 16, 1987 accident at Detroit Metro airport of a Northwest Airlines DC-9-82
provides an example of how automation has transformed the role of pilots. The airplane crashed
just after take-off en route to Phoenix. The airplane began rotation at 1,200 ft from the end of the
8,500 ft runway, when its wings rolled to the left and then to the right. The wings collided with a
light pole located 0.5 miles beyond the end of the runway. One hundred and fifty-four people 
died in the crash with only one survivor. 

For a plane to be properly configured for take-off, the flaps and slats on the wings must 
be fully extended. The National Transportation Safety Board (NTSB) report attributed the 
accident to the non-use of the taxi checklist to ensure that the flaps and slats of the wings were
extended. The take-off warning system was cited as a contributing factor because it was not 
functioning and failed to warn the crew that the plane was not ready for take-off. The airplane’s
stall protection system announces a stall and will perform a stick pusher maneuver to correct for 
the problem. However, autoslat extension and poststall recovery are disabled if slats are 
retracted. In addition, the tone and voice warning of the stall protection system are automatically 
disabled in flight by nose gear extension (Billings, 1997; NTSB, 1998). Originally, pilots
manually extended the flaps and slats, performed any maneuvering needed if a stall did occur 
with the airplane, and were responsible for the various other tasks needed for take-off. Due to the 
increase in automation of the cockpit, however, they now depend on the automation to perform 
the pre-flight tasks reliably and without incident. Pilots have now been relegated to the passive
role of monitoring the automation and are to interfere in its processes only in emergency 
situations. 

The example above illustrates a concept known as “hazardous states of awareness” 
(HSA; Pope & Bogart, 1992). Pope and Bogart coined the term to refer to phenomenological 
experiences, such as daydreaming, “spacing out” from boredom, or “tunneling” of attention, 
reported in aviation safety incident reports. Hazardous states of awareness such as preoccupation, 
complacency, and excessive absorption in a task, and the associated task disengagement have 
been implicated in operator errors of omission and neglect with automated systems (Byrne & 
Parasuraman, 1996). The 1987 Detroit accident was caused partly by the crew’s complacent 
reliance on the airplane’s automation to configure for take-off and failure to confirm the 
configuration with the use of the taxi checklist (Billings, 1997). 

Automation-Induced Complacency 

Wiener (1981) defined complacency as “a psychological state characterized by a low 
index of suspicion.” Billings, Lauber, Funkhouser, Lyman, and Huff (1976), in the Aviation 
Safety Reporting System (ASRS) coding manual, defined it as “self-satisfaction, which may 
result in non-vigilance based on an unjustified assumption of satisfactory system state.” The 
condition is surmised to result when working in highly reliable automated environments in which 
the operator serves as a supervisory controller, monitoring system states for the occasional 
automation failure. It is exhibited as a false sense of security, which the operator develops while 
working with highly reliable automation; however, no machine is perfect and can fail without 
warning. Studies and ASRS reports have shown that automation-induced complacency can have 
negative performance effects on an operator’s monitoring of automated systems (Parasuraman, 
Molloy, & Singh, 1993). 

Although researchers agree that complacency continues to be a serious problem, little 
consensus exists as to what complacency is and the best methods for measuring it. Nevertheless, 
after considering the frequency with which the term “complacency” is encountered in the ASRS 
and analyses of aviation accidents, Wiener (1981) proposed that research begin on the construct 
of complacency so that effective countermeasures could be developed. 

One of the first empirical studies on complacency was that of Thackray and Touchstone (1989),
who asked participants to perform a simulated ATC task either with or without the help of an 
automated aid. The aid provided advisory messages to help resolve potential aircraft-to-aircraft 
collisions. The automation failed twice per session, once early and another time late during the 2- 
hr experimental session. These researchers reasoned that complacency should be evident and, 
therefore, participants would fail to detect the failures of the ATC task due to the highly reliable 
nature of the automated aid. However, although participants were slower to respond to the initial 
failure, reaction times were faster to the second automated failure. 

Parasuraman, Molloy and Singh (1993) reasoned that participants in the Thackray and 
Touchstone (1989) experiment did not experience complacency because of the relatively short 
experimental session and because the participants performed a single monitoring task. ASRS
reports involving complacency have revealed that it is most likely to develop under conditions in 
which the pilot is responsible for performing many functions, not just monitoring the automation 
involved. Parasuraman et al. (1993) suggested that in multi-task environments, such as an 
airplane cockpit, characteristics of the automated systems, such as reliability and consistency, 
dictate how well the pilot detects and responds to automation failures. Langer (1989) developed 
the concept of premature cognitive commitment to help clarify the etiology of automation- 
induced complacency. According to Langer, 

When we accept an impression or piece of information at face value, with no 
reason to think critically about it, perhaps because it is irrelevant, that impression 
settles unobtrusively into our minds until a similar signal from the outside world 
- such as a sight or sound - calls it up again. At that next time it may no longer 
be irrelevant, most of us don’t reconsider what we mindlessly accepted earlier. 

Premature cognitive commitment develops when a person initially encounters a stimulus, device, 
or event in a particular context; this attitude or perception is then reinforced when the stimulus is 
re-encountered in the same way. Langer (1989) identified a number of antecedent conditions that 
produce this attitude, including routine, repetition, and extremes of workload; these are all
conditions present in today’s automated cockpit. Therefore, automation that is consistent and 
reliable is more likely to produce conditions in multi-task environments that are susceptible to 
fostering complacency, compared to automation of variable reliability. 

Parasuraman, Molloy and Singh (1993) examined the effects of variations in reliability 
and consistency on user monitoring of automation failures. Participants were asked to perform a 
manual tracking, fuel management, and system-monitoring task for four 30-minute sessions. The 
automation reliability of the system-monitoring task was defined as the percentage of automation 
failures that were corrected by the automated system. Participants were randomly assigned to one 
of three automation reliability groups, which included: constant at a low (56.25%) or high 
(87.5%) level or a variable condition in which the reliability alternated between high and low 
every ten minutes during the experimental session. Participants exhibited significantly poorer 
performance using the system-monitoring task under the constant-reliability conditions than 
under the variable-reliability condition. There were no significant differences between the 
detection rates of the participants who initially monitored under high reliability versus those who 
initially monitored under low reliability. Furthermore, evidence of automation-induced 
complacency was witnessed after only 20 minutes of performing the tasks. Parasuraman et al. 
(1993) therefore concluded that the consistency of performance of the automation was the major 
influencing factor in the onset of complacency regardless of the level of automation reliability. 

Singh, Molloy, and Parasuraman (1997) replicated these results in a similar experiment, 
which examined whether having an automated task centrally located would improve monitoring 
performance during a flight-simulation task. The automation reliability for the system- 
monitoring task was constant at 87.5% for half the participants and variable (alternating between 
56.25% and 87.5%) for the other half. The low constant group was not used in this study because 
participants in previous studies were found to perform equally poorly in both constant reliability 
conditions. A constant high level of reliability was used instead because complacency is believed 
to most likely occur when an operator is supervising automation that he or she perceives to be 
highly reliable (Parasuraman et al., 1993). Singh and his colleagues found the monitoring of 
automation failure to be inefficient when reliability of the automation was constant but not when 
it was variable, and that locating the task in the center of the computer screen could not prevent 
these failures. These results indicate that the automation-induced complacency effect discovered 
by Parasuraman et al. (1993) is a relatively robust phenomenon, which is applicable to a wide variety of
automation reliability schedules. The poor performance in the constant-reliability conditions in
both research studies may be a result of the participants’ premature cognitive commitment or
perceived trust in the automation to correct for system failures.

Trust in Automation and Complacency 

Automation reliability and consistency have been shown to impart trust and confidence in 
automation (Lee & Moray, 1994; Muir, 1987; Muir & Moray, 1996). Muir (1994) defines trust in 
human-machine relationships as “Trust (T) being a composite of three perceived expectations:
the fundamental expectation of persistence (P); technically competent performance (TCP) which 
includes skill-, rule-, and knowledge- based behaviors, as well as reliability and validity of a 
referent (machine); and to fiduciary responsibility (FR) of the automation.” 

The specific expectation of technically competent role performance is the defining 
feature of trust between humans and machines. Barber (1983) identified three types of technical 
competence one may expect from another person or a machine: expert knowledge, technical 
facility, and everyday routine performance. Muir (1987) suggests that a human’s trust in a 
machine is a dynamic expectation that undergoes predictable changes as a result of experience 
with the system. In early experiences a person will base his or her trust upon the predictability of 
the machine’s recurrent behaviors. Automation reliability may instill trust and confidence in the 
automated system. However, trust in the automation often declines after an automation 
malfunction or failure, but will recover and increase as long as there are no further malfunctions. 
Therefore, long periods without failure also may foster poor monitoring of the automation (Lee & 
Moray, 1992; Riley, 1989). 

Sheridan and Farrell (1974) first expressed concern about the changing roles in the modern
cockpit, in which the role of a pilot changed to that of a supervisory controller of automation,
significantly transforming pilot-machine interaction. Muir (1989) confirmed these concerns and
demonstrated that participants could discriminate between unreliable and reliable components of 
automated systems. Will (1991) also found that characteristics of automated agents, such as 
reliability, correlated with user trust in the system. Furthermore, the confidence of the user was 
shown to significantly impact how they interacted with the automation and the degree of trust 
instilled in it. 

Lee and Moray (1992) reported that trust in automation does affect the operators’ use of 
manual control if their trust is greater than their own self-confidence to perform the tasks. Riley 
(1994) identified self-confidence in one’s manual skills as an important factor in automation
usage. Riley (1989) noted that trust in the automation alone does not affect the decision to use
automation, but rather a complex relationship involving trust, self-confidence, workload, skill
level, and other variables determines the “reliance” factor of using automation.


Self-Efficacy and Complacency 

Crew “complacency” has often been implicated as a contributing factor in aviation 
accidents. The term has become more prominent with the increase in automation technology in 
modern cockpits. As a consequence, there has also been an increase in research to understand the
nature of complacency and to identify countermeasures to its onset. Parasuraman, Molloy, and 
Singh (1993) noted that complacency arises from overconfidence in automation reliability. They 
found that operators missed “automation failures” when the automated system was highly 
reliable. Riley (1996) reported that an operator’s decision to rely on automation may actually
depend on a complex relationship between level of trust in the system, self-confidence, and other 
factors. Lee and Moray (1994) also found that trust in automation and self-confidence can 
influence decisions to use or not to use automation, but that there were large individual 
differences. The idea of individual differences was examined recently by Singh, Molloy, & 
Parasuraman (1993a). They reported a modest relationship between individual differences in 
complacency potential and energetic-arousal with automation-related monitoring inefficiency. 
Lee (1992) also conducted a number of studies examining these relationships and provided 
evidence that self-confidence, or self-efficacy, coupled with overestimations of automation
reliability, operationally defined as trust in automation, can influence an operator’s decision to rely
on automation.

Self-efficacy refers to expectations that people hold about their abilities to accomplish 
certain tasks. Bandura (1986) argued that people’s decisions to undertake particular tasks depend upon
whether or not they perceive themselves as efficacious in performing those tasks. The stronger the
operator’s self-efficacy, the longer he or she will persist and exert effort to accomplish the task
(Garland et al., 1988). Studies have shown that people with higher self-efficacy for tasks
perform better in those tasks compared to people with lower self-efficacy (Bandura, 1997).
However, in the aviation context, conditions can arise in which self-efficacy and the concomitant 
overconfidence in one’s ability can impair performance. An example of this would be a pilot not 
off-loading tasks to automation during high workload situations because of his or her 
overconfidence in managing flight tasks. Therefore, the present study examined the effects of 
self-efficacy on automation use and complacency under high and low workload conditions. 


Objectives of Present Study 

The present study sought to further explore the effects of individual differences in 
automation use. Specifically, the generalizability of self-efficacy in monitoring performance and 
its relationship to automation-induced complacency was studied in addition to how the 
psychological construct describes the “use”, “disuse”, or “misuse” (Parasuraman & Riley, 1997) 
of automation. 

Thirty participants were randomly selected from a pool of subjects who had participated 
in past monitoring, supervisory control, or vigilance experiments and, therefore, had a known A’ 
score of perceptual sensitivity to critical event-to-noise ratio. Participants were equated across 
randomly assigned groups and then were asked to perform a 30-min vigilance task that required 
responses to critical events. There were 30 critical events (1 / minute) presented during the vigil 
and the event rate was 30 per minute. Afterwards, each participant was asked to complete task- 
specific self-efficacy and self-confidence questionnaires. There was no statistical difference 
between the participants in task performance (A’). These participants were then assigned to two 
experimental groups based on a median split of the self-efficacy questionnaires, which provided an
estimate of each participant’s confidence in performing the task. All participants were asked to
return after one week to perform a system monitoring, resource management, and tracking task
from the Multi-Attribute Task Battery (Comstock & Arnegard, 1992) under high-reliability and
low-reliability conditions.
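
The median split described above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the participant identifiers and questionnaire totals are hypothetical, and placing scores equal to the median in the low group is an assumption, since the text does not say how ties were handled.

```python
from statistics import median


def median_split(scores):
    """Split participants into low and high groups around the median score.

    `scores` maps a participant identifier to a self-efficacy score.
    Assumption: scores equal to the median are placed in the low group.
    """
    if not scores:
        raise ValueError("at least one score is required")
    cutoff = median(scores.values())
    low = sorted(pid for pid, score in scores.items() if score <= cutoff)
    high = sorted(pid for pid, score in scores.items() if score > cutoff)
    return low, high


# Hypothetical questionnaire totals for four participants.
example_scores = {"p01": 42, "p02": 55, "p03": 61, "p04": 38}
low_group, high_group = median_split(example_scores)
```

With an even number of participants and no tied scores, this yields two groups of equal size.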

The difficulty of the tasks was varied during each task run, and participants had the 
option to off-load the tracking task to the automation if needed. Performance, workload, and 
complacency measures were collected. It was hypothesized that participants who scored low in 
self-efficacy would be more likely to trust the automation and, therefore, exhibit complacent 
performance than participants who scored high in self-efficacy. However, because high
self-efficacy participants may be less willing to utilize automation, given a propensity to distrust
automation in favor of manual skills, it was also expected that there would be a “double-edged 
sword” effect found. Finally, it was hypothesized that, under conditions of high mental workload, 
high self-efficacy participants would do significantly worse than the low self-efficacy participants 
who engaged automation to reduce the taskload when the task difficulty was increased. 

METHOD 


Participants 

Thirty participants (age 18 to 39) were subjects of the study. All participants had pilot
experience ranging from 0 to 110 flight hours. Of the 30 participants, six had Visual
Flight Rules (VFR) pilot certifications and 20 were in various stages of obtaining a private pilot’s
license. The remaining four participants had a significant amount of Microsoft© Flight Simulator
experience. Although flight experience was not considered an essential prerequisite for inclusion 
in the experiment, all participants were given a short test to assess their level of pilot knowledge. 
The pilots were given experimental credit toward university coursework or given the monetary 
amount of $25. All participants had participated in previous vigilance studies and were randomly 
selected from a pool of eligible participants. 

Baseline Task 

The task was a 30-min simultaneous vigilance task that required participants to monitor 
the repetitive presentation of a pair of 3 mm (W) × 38 mm (H) white lines separated by 25 mm.
These lines appeared in the center of the monitor screen. The stimuli were white (shown black in
Figure 1) and were presented on a blue background. Critical signals (targets) were 2 mm taller
than neutral events and occurred once a minute at random intervals. The event rate of 
presentation was 30 stimuli / minute. The participants were required to respond to the presence 
of the critical signals by pressing the space bar on the keyboard. The simultaneous vigilance task 
has been used in a number of studies and has been shown to be a valid task for inducing low 
vigilance states (Warm & Parasuraman, 1984). 

[Figure 1: two panels labeled “Neutral Event” and “Critical Event”]

Figure 1. Baseline Vigilance Task (Not to Scale) 
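
The event counts implied by the baseline task parameters can be worked out directly. The short sketch below uses only the values stated in the text (a 30-min vigil, 30 stimuli per minute, one critical signal per minute); the seconds-per-event figure assumes evenly spaced stimuli, which the text does not state.

```python
VIGIL_MINUTES = 30          # length of the vigil (from the text)
EVENTS_PER_MINUTE = 30      # stimulus presentation rate (from the text)
CRITICAL_PER_MINUTE = 1     # critical signals per minute (from the text)

total_events = VIGIL_MINUTES * EVENTS_PER_MINUTE        # all stimuli shown
critical_events = VIGIL_MINUTES * CRITICAL_PER_MINUTE   # targets
neutral_events = total_events - critical_events         # non-targets
signal_probability = critical_events / total_events     # chance a stimulus is a target

# Assumption: stimuli are evenly spaced within each minute.
seconds_per_event = 60 / EVENTS_PER_MINUTE
```

Under these values a participant sees 900 stimuli, of which 30 are critical signals, so roughly one stimulus in thirty is a target.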


Experimental Task 

Participants were run using a modified version of the NASA Multi-Attribute Task (MAT) 
battery (Comstock and Arnegard, 1992). The MAT battery is composed of four different task 
windows: tracking, system monitoring, communication and fuel management. These different 
tasks were designed to simulate the tasks that airplane crewmembers often perform during flight. 
Each of these tasks can be fully or partially automated. In the present study, only the tracking, 
monitoring, and resource management tasks were used. The monitoring task was the only task 
out of the three that was automated. The three tasks were displayed in separate windows of a 14- 
inch VGA color monitor. 

Tracking Task. A two-dimensional compensatory tracking task with joystick control is
presented in one window of the display (see Figure 2). The task requires participants to use the
joystick to maintain a moving circle, approximately 1 cm in diameter, centered on a 0.5 cm by 0.5
cm cross located in the center of the window. Failure to control the circle results in its drifting
away from the center cross. The tracking task uses a 4:3 horizontal-to-vertical sine wave driving 
function. The gain and difficulty levels were set at the default settings described in Comstock and 
Arnegard (1992). 

System Monitoring. The upper-left section of the MAT battery (Figure 2) shows the 
system monitoring task, which consists of four vertical gauges with moving pointers and green 
“OK” and red “Warning” lights. Normally, the green OK light is on and the pointers fluctuate 
around the center of each gauge. In each 10-min block of the task, 16 “system malfunctions” 
occurred at unpredictable intervals ranging from 13 to 72 sec. When a system malfunction 
occurred, the pointer on one of the four engine gauges went “off limits”. When the engine gauge 
went “off limits”, the pointer shifted its center position away from the center of the vertical 
gauge, independent of the other 3 pointers and at intervals according to a predefined script. 
According to the predefined script programmed into the MAT for each task mode, the monitoring 
task detected 14 out of the 16 malfunctions automatically for the high reliability task mode and 9 
out of the 16 malfunctions for the low reliability task mode. The red warning light came on and 
then went off when the automation had corrected a malfunction in 4 seconds, indicating 
successful identification and correction of the malfunction. During this time, the participant’s
response keys were disabled to prevent manual input. 

However, from time to time the automation failed to detect a malfunction. When the 
automation routine failed, the pointer changed its position from the center of the scale on one of 
the gauges independent of the other three gauges. However, the green OK light remained on and 
no red light appeared. The operator was responsible for detecting pointer shifts occurring on any 
of the four gauges, regardless of direction, and was required to respond by pressing one of the 
four function keys (F1, F2, F3, or F4) corresponding to the labels below each vertical gauge.
Once the malfunction was detected, the pointer of the appropriate gauge moved immediately back 
to the center point and remained there without fluctuating for a period of 1.5 sec. (i.e. no 
malfunctions occurred during this time). If the participant failed to detect a malfunction, it was 
automatically corrected within 10 sec. 

If the participant responded appropriately to an automation failure by pressing the correct 
function key, the response was scored as a correct detection of an automation failure. If the 
participant failed to detect the failure within 10 sec, the gauge was reset and the response was 
scored as a miss. A detection error occurred if the operator detected an automation failure but 
incorrectly identified the gauge associated with the failure (e.g., pressing F1 for a malfunction in
engine 2). All other responses were classified as false alarms, making the performance measures 
for the system-monitoring task: (a) the probability of detection of automation failures, (b) reaction 
time (RT) for detection, and (c) the number of detection errors and false alarms made. 
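
The scoring rules above can be sketched in code. This is a minimal illustration under stated assumptions, not the MAT battery's own scoring routine: the `Failure` and `KeyPress` types and the `score_block` function are hypothetical names, and the sketch assumes that the first key press within the 10-sec window resolves a failure (as a correct detection or a detection error), a point the text leaves open.

```python
from dataclasses import dataclass

RESPONSE_WINDOW_SEC = 10.0  # undetected failures are reset after 10 sec (from the text)


@dataclass(frozen=True)
class Failure:
    onset: float  # seconds from the start of the block
    gauge: int    # 1-4, the gauge that went off limits


@dataclass(frozen=True)
class KeyPress:
    time: float   # seconds from the start of the block
    gauge: int    # 1-4, corresponding to F1-F4


def score_block(failures, presses):
    """Classify key presses against automation failures for one block."""
    remaining = sorted(presses, key=lambda p: p.time)
    correct = misses = detection_errors = 0
    reaction_times = []
    for failure in sorted(failures, key=lambda f: f.onset):
        deadline = failure.onset + RESPONSE_WINDOW_SEC
        in_window = [p for p in remaining if failure.onset <= p.time <= deadline]
        if not in_window:
            misses += 1  # no response before the gauge was reset
            continue
        first = in_window[0]  # assumption: the first press resolves the failure
        remaining.remove(first)
        if first.gauge == failure.gauge:
            correct += 1
            reaction_times.append(first.time - failure.onset)
        else:
            detection_errors += 1  # failure noticed, wrong gauge identified
    return {
        "p_detect": correct / len(failures) if failures else None,
        "mean_rt": sum(reaction_times) / len(reaction_times) if reaction_times else None,
        "misses": misses,
        "detection_errors": detection_errors,
        "false_alarms": len(remaining),  # presses not tied to any failure
    }


# Hypothetical block with three automation failures and three key presses.
example = score_block(
    [Failure(20.0, 1), Failure(60.0, 2), Failure(100.0, 3)],
    [KeyPress(23.0, 1), KeyPress(62.0, 4), KeyPress(90.0, 2)],
)
```

In this hypothetical block the first failure is detected correctly, the second produces a detection error, the third is missed, and the press at 90 sec counts as a false alarm.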

Fuel Management. The fuel management task is displayed in the lower-right window
of the MAT battery (Figure 2). It requires participants to maintain a specific level of fuel within
both of the main tanks (A & B) by selectively activating pumps to keep pace with the fuel 
consumption in the tanks. The six rectangular regions represent the fuel tanks. The lines that 
connect the tanks are pumps that can transfer fuel from one tank to another in the direction 
indicated by the arrow. The numbers underneath the tanks represent the amount of fuel in gallons 
that each tank contains. This number is updated every two seconds. The maximum amount of 
fuel that can be in tank A or B is 4000 gallons and in tank C or D is 2000 gallons; the remaining
two tanks have unlimited capacity.

Participants were instructed to maintain fuel in tanks A and B at a tick mark that 
graphically depicts the level of 2500 gallons. The shaded region around the tick mark indicated 
acceptable performance. Tanks A and B were depleted at a rate of 800 gallons per minute and, 
therefore, to maintain an acceptable level of fuel, participants had to transfer fuel from one tank to
another by activating one or more of the eight fuel pumps. Pressing the number key that
corresponds to a pump activates that pump, and pressing the key a second time turns the pump off.

Figure 2. The Multi-Attribute Task Battery
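
The fuel dynamics described for tanks A and B can be illustrated with a small sketch. The depletion rate, target level, tank capacity, and 2-sec update interval come from the text; the pump inflow rate is a hypothetical parameter, because the text does not give pump flow rates, and `level_after` is an illustrative name rather than part of the MAT battery.

```python
DEPLETION_GPM = 800.0         # gallons per minute drawn from tank A or B (from the text)
TARGET_GALLONS = 2500.0       # level participants were asked to maintain (from the text)
MAIN_TANK_CAPACITY = 4000.0   # maximum for tank A or B (from the text)
UPDATE_INTERVAL_SEC = 2.0     # displayed level updates every two seconds (from the text)


def level_after(start_gallons, inflow_gpm, seconds):
    """Level of a main tank after `seconds`, stepped in 2-sec updates.

    `inflow_gpm` is the combined flow of the active pumps feeding the tank;
    it is a hypothetical value supplied by the caller.
    """
    level = start_gallons
    per_step = (inflow_gpm - DEPLETION_GPM) * UPDATE_INTERVAL_SEC / 60.0
    for _ in range(int(seconds // UPDATE_INTERVAL_SEC)):
        # Clamp between empty and the tank's stated capacity.
        level = min(MAIN_TANK_CAPACITY, max(0.0, level + per_step))
    return level


# With no pumps active, a tank starting at the target level loses 800 gallons in a minute.
unattended = level_after(TARGET_GALLONS, inflow_gpm=0.0, seconds=60)
```

With no pumps active, a tank at the 2500-gallon target would be empty in a little over three minutes, which is why the task demands continuing attention.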


Experimental Design 


A 2 (constant, variable reliability) X 2 (high, low self-efficacy) X 2 (high, low task 
difficulty) mixed design was used. The independent variables were reliability of the automation 
condition in the system-monitoring task, self-efficacy, and difficulty. Dependent variables were 
A’ performance, tracking RMS error, fuel level deviation, and NASA-TLX scores. The measures 
are discussed below under the heading of dependent measures. 

The automation reliability of the system-monitoring task was defined as the percentage of 
16 system malfunctions correctly detected by the automation routine in each 10-min block. The
automation routine was varied as a between-subjects factor (Constant or Variable Reliability), with
sessions (1-2 on consecutive days) and 10-min blocks (1-4) as within-subjects factors in the mixed
factorial design. The reliability schedule for each condition that was employed by this study is the
same one used by Singh et al. (1997). In the constant-reliability groups, the automation reliability 
was constant from block to block at 87.5% (14 out of 16 malfunctions detected by the 
automation) for each of the participants. This reliability level is used because complacency is 
most likely to result when working with highly reliable automated environments, in which the 
operator serves as a supervisory controller, monitoring system states for the occasional 
automation failure (Parasuraman et al., 1993). In the variable-reliability group, the automation 
reliability alternated every 10 min from low (9 out of 16 malfunctions detected by the automation 
or 56.25%) to high (87.5%) for half the participants and from high to low for the other half. No 
instructions about the reliability percentages of the automation were given to the participants 
other than the general instruction that the automation is not always reliable. 
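
The reliability percentages and block schedules described above follow from simple arithmetic, sketched below. The counts (16 malfunctions per block, 14 or 9 detected, four 10-min blocks per session) come from the text; the function names and the `"constant"`/`"variable"` labels are illustrative assumptions.

```python
MALFUNCTIONS_PER_BLOCK = 16  # system malfunctions per 10-min block (from the text)
DETECTED_HIGH = 14           # detected by the automation under high reliability
DETECTED_LOW = 9             # detected by the automation under low reliability
BLOCKS_PER_SESSION = 4       # 10-min blocks per session (from the text)


def reliability_percent(detected, total=MALFUNCTIONS_PER_BLOCK):
    """Percentage of malfunctions detected by the automation routine."""
    return 100.0 * detected / total


def block_schedule(condition, start_high=True, n_blocks=BLOCKS_PER_SESSION):
    """Number of malfunctions the automation detects in each block."""
    if condition == "constant":
        return [DETECTED_HIGH] * n_blocks
    if condition == "variable":
        first, second = DETECTED_HIGH, DETECTED_LOW
        if not start_high:
            first, second = second, first
        # Reliability alternates every block (every 10 min).
        return [first if i % 2 == 0 else second for i in range(n_blocks)]
    raise ValueError(f"unknown condition: {condition!r}")


def failures_left_to_operator(schedule):
    """Automation failures per block that the participant must catch."""
    return [MALFUNCTIONS_PER_BLOCK - detected for detected in schedule]


low_first = block_schedule("variable", start_high=False)
```

Under the constant condition the participant faces two automation failures per block; under the variable condition the count alternates between seven and two.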

Dependent Measures


RMSE. A global measure of task performance was obtained for each participant by 
computing the RMS error in fuel levels of tanks A and B (deviation from the required level of 
2500 gallons). Fuel levels were sampled and RMS errors computed for each 30-sec period; then 
they were averaged over a 10-min block to yield the RMS error for each block. Combined root- 
mean-square (RMS) errors were computed for samples collected over each 2-sec period and then 
averaged over a 10-min block to yield the mean RMS error for a given block. 
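
As a rough sketch of the fuel-level computation described above, the code below computes an RMS error per period against the 2500-gallon target and then averages the per-period values over a block. The function names, the sampling rate (expressed as `samples_per_period`), and the pooling of tank A and tank B samples into one list are assumptions made for illustration.

```python
import math

TARGET_GALLONS = 2500.0  # required fuel level stated in the text


def rms_error(samples, target=TARGET_GALLONS):
    """Root-mean-square deviation of fuel-level samples from the target."""
    return math.sqrt(sum((s - target) ** 2 for s in samples) / len(samples))


def block_rms_error(levels, samples_per_period):
    """Average the per-period RMS errors over one block.

    `levels` holds all fuel-level samples for the block (assumed here to be
    pooled over tanks A and B); `samples_per_period` is the assumed number of
    samples in each 30-sec period.
    """
    periods = [
        levels[i:i + samples_per_period]
        for i in range(0, len(levels), samples_per_period)
    ]
    return sum(rms_error(p) for p in periods) / len(periods)


if __name__ == "__main__":
    demo = [2490.0, 2510.0, 2500.0, 2500.0]
    print(block_rms_error(demo, samples_per_period=2))
```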

A’. A’ is a nonparametric counterpart of the “perceptual sensitivity” measure d’ (Wickens & 
Hollands, 2000) and has been used extensively in the supervisory control and vigilance literature 
(Warm & Parasuraman, 1984). Sensitivity refers to the separation of noise and signal 
distributions and reflects the theoretical index of where the two normally distributed curves 
intersect. The value of d’ reflects the signal-to-noise ratio: the higher the ratio, the higher the 
value of d’, relative to the case in which the two curves overlay each other. 
Wickens and Hollands (2000) describe the hypothetical distributions and theoretical foundations 
of signal detection theory and the measure of d’. d’ is often a theoretical value since the two 
curves cannot be obtained and plotted on a receiver operating characteristic (ROC) curve. Therefore, 
A’, which measures the area under the ROC curve, can be substituted. This provides the 
advantage of being parameter free and not relying on assumptions or estimations of the shape or 
form of the signal and noise distributions. A’ can be obtained by the formula: A’ = [probability 
(hit) + (1 - probability (false alarm))] / 2. A’ values range from .5 to 1, with 1 being perfect 
performance (i.e., all signals detected and no false alarms). 
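
The A’ formula above can be worked through with a short sketch. It follows the formula exactly as stated in the text; other formulations of A’ exist in the signal detection literature. The function name and the use of raw response counts as inputs are assumptions for illustration.

```python
def a_prime(hits, misses, false_alarms, correct_rejections):
    """A' as given in the text: [p(hit) + (1 - p(false alarm))] / 2.

    Inputs are raw response counts (an assumption of this sketch); the
    probabilities are estimated as simple proportions.
    """
    p_hit = hits / (hits + misses)
    p_false_alarm = false_alarms / (false_alarms + correct_rejections)
    return (p_hit + (1.0 - p_false_alarm)) / 2.0


if __name__ == "__main__":
    # Hypothetical counts: 14 of 16 signals detected, 5 false alarms on 20 noise trials.
    print(a_prime(hits=14, misses=2, false_alarms=5, correct_rejections=15))
```

With these hypothetical counts, p(hit) = .875 and p(false alarm) = .25, so A’ = (.875 + .75) / 2 = .8125.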

NASA-TLX. The NASA-TLX (Hart & Staveland, 1988) is a multidimensional measure 
of subjective workload. It requires the participant to complete a series of ratings on six 20-point 
scales (mental demand, physical demand, temporal demand, performance, effort, and frustration 
level). The “traditional” TLX scoring procedure combines the six scales, using paired 
comparison-derived weights, to provide a unitary index of workload. Byers, Bittner, and Hill 
(1989), however, demonstrated that a simple summation of responses on the six subscales 
produced comparable means and standard deviations, and that this “raw” procedure correlated 
between 0.96 and 0.98 with the paired comparison procedure. This study, therefore, summed the 
ratings of each scale, without the derived weighting, to provide an overall index of subjective 
workload for each participant. 
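
The “raw” scoring procedure described above amounts to an unweighted sum of the six subscale ratings. A minimal sketch follows; the subscale keys and function name are hypothetical, and no rescaling is applied because the text does not specify one at this point.

```python
# Hypothetical subscale keys, following the six scales named in the text.
SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings):
    """Sum the six subscale ratings without paired-comparison weights."""
    missing = [name for name in SUBSCALES if name not in ratings]
    if missing:
        raise ValueError("missing subscale ratings: " + ", ".join(missing))
    return sum(ratings[name] for name in SUBSCALES)


if __name__ == "__main__":
    example = {name: 10 for name in SUBSCALES}  # made-up ratings
    print(raw_tlx(example))
```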


Experimental Procedure 

The thirty participants selected for participation in the present study had previous 
vigilance research experience. A’ scores, a measure of perceptual sensitivity (Warm & Parasuraman, 
1982), had been collected and were available prior to selection. Each participant was given a 
self-efficacy and self-confidence questionnaire (Bandura, 1986), which measured his or her 
self-perceptions of confidence in being able to complete the vigilance task. Participants were not 
provided any knowledge-of-results on performance. 

All participants were matched on A’ and questionnaire scores and placed into equal groups from 
which to randomly draw. Approximately 12 weeks later, participants were notified that they were 
eligible to participate in the experiment. Seventy-five percent of the 40 participants agreed to be 
part of the present study. The thirty participants were divided into high and low self-efficacy 
experimental groups based on a median split of self-efficacy questionnaire scores. A’ scores were not 
significantly different across the groups (p > .05). However, because these A’ scores were based 
on previous vigilance task performance, all participants selected for participation in the study 
were asked to perform an additional 30-min vigilance session using the bar-type vigilance 
baseline task described above. Again, no differences were found in A’ (p > .05) between the high 
and low self-efficacy groups. 
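
The median split mentioned above can be sketched as follows. The text does not say how scores tied at the median were handled; assigning ties to the low group is an assumption of this sketch, as are the function name and data layout.

```python
import statistics


def median_split(scores):
    """Split participants into (high, low) groups at the median score.

    `scores` maps a participant id to a questionnaire score. Scores equal
    to the median go to the low group (an assumption of this sketch).
    """
    med = statistics.median(scores.values())
    high = sorted(pid for pid, s in scores.items() if s > med)
    low = sorted(pid for pid, s in scores.items() if s <= med)
    return high, low


if __name__ == "__main__":
    demo = {"p1": 10, "p2": 20, "p3": 30, "p4": 40}  # made-up scores
    print(median_split(demo))
```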

Participants were invited back one week later to complete four 30-minute sessions over a 
two-day period. Each participant received a 10-minute baseline practice session for 
familiarization with the MAT after having been provided detailed instructions on the functionality 
and operation of the task battery. After the baseline session, the experimental trials began; they 
lasted 30 minutes each and were randomly counterbalanced for high- and low-reliability 
conditions. Participants were informed that the system-monitoring task was automated, and that 
the fuel management and tracking tasks were manual. They were informed that the automation for 
the system-monitoring task was not 100% reliable and that they were required to supervise the 
automation in order to respond to any malfunctions that the automation failed to detect. 
Participants were instructed to focus on all three tasks equally and to perform on each to the best 
of their ability. Participants were required to return the following day to complete the second session 
(trials 3 and 4). There was no practice period for the second session. Two separate sessions were 
required because complacency has been found to be “more easily” induced under multiple 
sessions using a multiple-task environment (Parasuraman et al., 1993). After each experimental 
trial, the NASA-TLX was administered. 


RESULTS 

The data from the study were analyzed using a series of MANOVAs (multivariate analyses 
of variance) and ANOVAs (analyses of variance). In all cases, alpha was 
set at .05 and was used to determine statistical significance. Only effects that were statistically 
significant in the MANOVA were subjected to ANOVAs and are reported in the results. Expected mean 
squares were computed for all main effects and interactions and were used to determine the error 
term for main effects (subjects (self-efficacy)) and interactions (reliability*subjects 
(self-efficacy)). Analysis of simple effects was used to examine significant interactions. Data were 
collapsed across experimental sessions because significant effects were not found for any 
dependent variable across the four 30-minute experimental sessions, p > .05. 


A’ 


An ANOVA procedure was performed on the A’ data for the main effects of self-efficacy 
(high, low) and reliability condition (variable, constant) and the interaction between self-efficacy 
and reliability condition. Significant main effects were found for self-efficacy, F(1, 28) = 247.82, 
p < .0001, and reliability, F(1, 28) = 74.11, p < .0001, along with a significant 
self-efficacy*reliability interaction, F(1, 28) = 39.98, p < .0001. There were significant differences 
between high (0.89067) and low (0.73767) self-efficacy, and between variable (0.856) and constant 
(0.77233) reliability conditions. Figure 3 graphically 
presents the self-efficacy*reliability interaction. Simple effects analysis determined that the 
significant differences were between the low self-efficacy, constant reliability combination and 
all other self-efficacy, reliability combinations (3), p < .05. 

Figure 3. Reliability*Self-Efficacy Interaction for A’ 


RMSE 

No significant differences were found in root-mean-square error for the resource 
management task, p > .05. Overall, participants performed equally well regardless of 
self-efficacy (low = 48.90; high = 52.34) or reliability condition (constant = 49.34; variable = 51.90). 
No further analyses were conducted on this dependent variable. 

NASA Task Load Index 

Byers, Bittner, & Hill (1991) showed that the NASA-TLX could be analyzed using the 
raw scores, rather than paired comparison scaled scores, to compute workload ratings. The 
NASA-TLX was scored on a 0 to 100 range representing 6 subscales with 20 points. 
Participants rated the high workload condition (67.467) as significantly higher in workload than 
the low workload condition (53.200), F(1, 28) = 115.73, p < .0001, which validates the 
experimental manipulation of workload conditions. An ANOVA also revealed that, overall, high 
self-efficacy participants (62.667) rated the tasks as significantly higher in workload than low 
self-efficacy participants (58.00), F(1, 28) = 12.38, p < .05. 

The main effect of self-efficacy on TLX ratings must be considered in light of the 
interaction effect found between self-efficacy and reliability, F(1, 28) = 1120.90, p < .0001, 
presented in Figure 4. A simple effects analysis found that high self-efficacy participants, under 
the high workload condition, rated the task as significantly higher in workload than the other 3 
combinations of self-efficacy and reliability (i.e., low self-efficacy / low workload; low 
self-efficacy / high workload; high self-efficacy / low workload). Moreover, low self-efficacy 
participants rated the task significantly higher under the LOW workload condition than the high 
workload condition, probably due to the off-loading of the tracking task to the automation during 
the high workload condition. High self-efficacy participants (100%) did not take advantage of the 
option to off-load the tracking task to the automation during this time, whereas 87% of low 
self-efficacy participants did. 




[Figure 4 graphic omitted. X-axis categories (self-efficacy / reliability): High / Constant, High / Variable, Low / Constant, Low / Variable.] 


Figure 4. Reliability*Self-Efficacy Interaction for NASA-TLX 

DISCUSSION 

As predicted, participants with high self-efficacy did significantly better in both the 
constant and variable reliability conditions. Only participants with low self-efficacy were found 
to have suffered automation-induced complacency. However, these participants did significantly 
better than high self-efficacy participants during the high workload condition. Parasuraman, 
Molloy, and Singh (1993) reported that complacency arises only in multiple task situations with 
concomitant increases in workload. Therefore, offloading the task may have reduced the workload 
demands enough to free up cognitive resources that allowed low self-efficacy 
participants to perform the systems monitoring task more effectively than high self-efficacy 
participants. The high self-efficacy participants did not take the option of off-loading the tracking 
task to the automation when given the opportunity to do so. This is confirmed by the fact that low 
self-efficacy participants rated workload to be significantly lower in the high workload condition than 
high self-efficacy participants. 

The results of the study suggest that self-efficacy is an important moderator variable of 
whether an operator will succumb to automation-induced complacency. Having low 
self-efficacy, a reflection of one’s perception of ability, regardless of skill level, can set up cognitive 
strategies that may increase the potential for succumbing to automation-induced complacency. 
However, high self-efficacy may serve as a double-edged sword in producing overconfidence in 
one’s ability that may limit other strategies, such as task offloading, for managing cognitive 
workload. Future research should extend these findings to actual pilot populations to determine 
the relationship of self-efficacy to automation-induced complacency and how self-efficacy can 
moderate the pilot-automation interaction in supervisory control environments. 